20 research outputs found
LMLFM: Longitudinal Multi-Level Factorization Machine
We consider the problem of learning predictive models from longitudinal data,
consisting of irregularly repeated, sparse observations from a set of
individuals over time. Such data often exhibit longitudinal correlation
(LC) (correlations among observations for each individual over time), cluster
correlation (CC) (correlations among individuals that have similar
characteristics), or both. These correlations are often accounted for using
mixed effects models that include fixed effects and random
effects, where the fixed effects capture the regression parameters that are
shared by all individuals, whereas random effects capture those parameters that
vary across individuals. However, the current state-of-the-art methods are
unable to select the most predictive fixed effects and random effects from a
large number of variables, while accounting for complex correlation structure
in the data and non-linear interactions among the variables. We propose
Longitudinal Multi-Level Factorization Machine (LMLFM), to the best of our
knowledge, the first model to address these challenges in learning predictive
models from longitudinal data. We establish the convergence properties, and
analyze the computational complexity, of LMLFM. We present results of
experiments with both simulated and real-world longitudinal data which show
that LMLFM outperforms the state-of-the-art methods in terms of predictive
accuracy, variable selection ability, and scalability to data with a large number
of variables. The code and supplemental material are available at
https://github.com/junjieliang672/LMLFM.
Comment: Thirty-Fourth AAAI Conference on Artificial Intelligence, accepte
How Do We Move: Modeling Human Movement with System Dynamics
Modeling how humans move through space is useful for policy-making in
transportation, public safety, and public health. Human movement can be viewed
as a dynamic process in which humans transition between states (e.g., locations) over
time. In the human world where intelligent agents like humans or vehicles with
human drivers play an important role, the states of agents mostly describe
human activities, and the state transition is influenced by both the human
decisions and physical constraints from the real-world system (e.g., agents need
to spend time to move over a certain distance). Therefore, the modeling of
state transition should include the modeling of the agent's decision process
and the physical system dynamics. In this paper, we propose a method to model
state transition in human movement from a novel perspective, by learning the
decision model and integrating the system dynamics. The method learns human
movement with Generative Adversarial Imitation Learning and integrates the
stochastic constraints from system dynamics in the learning process. To the
best of our knowledge, we are the first to learn to model the state transition
of moving agents with system dynamics. In extensive experiments on real-world
datasets, we demonstrate that the proposed method can generate trajectories
similar to real-world ones, and outperform the state-of-the-art methods in
predicting the next location and generating long-term future trajectories.
Comment: Accepted by AAAI 2021, Appendices included. 12 pages, 8 figures. in
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence
(AAAI'21), Feb 202
ReWOO: Decoupling Reasoning from Observations for Efficient Augmented Language Models
Augmented Language Models (ALMs) blend the reasoning capabilities of Large
Language Models (LLMs) with tools that allow for knowledge retrieval and action
execution. Existing ALM systems trigger LLM thought processes while pulling
observations from these tools in an interleaved fashion. Specifically, an LLM
reasons to call an external tool, gets halted to fetch the tool's response, and
then decides the next action based on all preceding response tokens. Such a
paradigm, though straightforward and easy to implement, often leads to high
computational complexity from redundant prompts and repeated execution. This
study addresses such challenges for the first time, proposing a modular
paradigm ReWOO (Reasoning WithOut Observation) that detaches the reasoning
process from external observations, thus significantly reducing token
consumption. Comprehensive evaluations across six public NLP benchmarks and a
curated dataset reveal consistent performance enhancements with our proposed
methodology. Notably, ReWOO achieves 5x token efficiency and 4% accuracy
improvement on HotpotQA, a multi-step reasoning benchmark. Furthermore, ReWOO
demonstrates robustness under tool-failure scenarios. Beyond prompt efficiency,
decoupling parametric modules from non-parametric tool calls enables
instruction fine-tuning to offload LLMs into smaller language models, thus
substantially reducing model parameters. Our illustrative work offloads
reasoning ability from 175B GPT3.5 into 7B LLaMA, demonstrating the significant
potential for truly efficient and scalable ALM systems.
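The idea of detaching reasoning from observations can be illustrated with a minimal sketch. The abstract does not spell out the mechanism, so everything here is an assumption made for illustration: the plan format, the `#E1`-style placeholders, the `Lookup` tool, and the `run_plan` helper are hypothetical names, and the plan is hard-coded where a real system would obtain it from a single LLM call.

```python
import re

# Hypothetical toy tool; a real system would call a search engine, calculator, etc.
def lookup(query: str) -> str:
    facts = {
        "capital of France": "Paris",
        "population of Paris": "about 2.1 million",
    }
    return facts.get(query, "unknown")

TOOLS = {"Lookup": lookup}

# The whole plan is written before any tool runs. Later steps refer to earlier
# results through placeholders instead of waiting for the actual observations.
PLAN = [
    ("#E1", "Lookup", "capital of France"),
    ("#E2", "Lookup", "population of #E1"),
]

def run_plan(plan, tools):
    """Run each step in order, filling placeholders with earlier evidence."""
    evidence = {}
    for name, tool, arg in plan:
        filled = re.sub(r"#E\d+", lambda m: evidence[m.group(0)], arg)
        evidence[name] = tools[tool](filled)
    return evidence

evidence = run_plan(PLAN, TOOLS)
print(evidence)
```

Because the plan is fixed up front, the tool calls need no further language-model prompts; only a final step that reads the collected evidence would call the model again.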
Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off
Over-parameterization of deep neural networks (DNNs) has shown high
prediction accuracy for many applications. Although effective, the large number
of parameters hinders its popularity on resource-limited devices and has an
outsize environmental impact. Sparse training (using a fixed number of nonzero
weights in each iteration) could significantly mitigate the training costs by
reducing the model size. However, existing sparse training methods mainly use
either random-based or greedy-based drop-and-grow strategies, resulting in
local minima and low accuracy. In this work, we consider the dynamic sparse
training as a sparse connectivity search problem and design an exploitation and
exploration acquisition function to escape from local optima and saddle points.
We further design an acquisition function and provide the theoretical
guarantees for the proposed method and clarify its convergence property.
Experimental results show that sparse models (up to 98% sparsity) obtained by
our proposed method outperform the SOTA sparse training methods on a wide
variety of deep learning tasks. On VGG-19 / CIFAR-100, ResNet-50 / CIFAR-10,
ResNet-50 / CIFAR-100, our method has even higher accuracy than dense models.
On ResNet-50 / ImageNet, the proposed method has up to 8.2% accuracy
improvement compared to SOTA sparse training methods.
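A drop-and-grow step that balances exploitation and exploration might look like the sketch below. The abstract does not give the acquisition function, so the score used here (gradient magnitude plus a bonus that shrinks the more often a weight has been active) is an assumed stand-in, not the paper's formula. The function name `drop_and_grow` and the parameters `counts`, `c`, and `eps` are hypothetical.

```python
import numpy as np

def drop_and_grow(weights, mask, grads, counts, step, k, c=1.0, eps=1.0):
    """One mask update that keeps the number of nonzero weights fixed.

    Drop: deactivate the k active weights with the smallest magnitude.
    Grow: activate the k inactive weights with the highest score, where
    score = |grad| (exploitation) + c * log(step) / (counts + eps) (exploration).
    The score is an assumption made for illustration.
    """
    shape = mask.shape
    w, g, n = weights.ravel(), grads.ravel(), counts.ravel()
    m = mask.ravel().copy()

    active = np.flatnonzero(m)
    inactive = np.flatnonzero(~m)  # candidates chosen before dropping

    drop = active[np.argsort(np.abs(w[active]))[:k]]
    score = np.abs(g[inactive]) + c * np.log(step) / (n[inactive] + eps)
    grow = inactive[np.argsort(-score)[:k]]

    m[drop] = False
    m[grow] = True
    return m.reshape(shape)

rng = np.random.default_rng(0)
mask = np.zeros(16, dtype=bool)
mask[rng.choice(16, size=8, replace=False)] = True
mask = mask.reshape(4, 4)
weights = rng.normal(size=(4, 4)) * mask
grads = rng.normal(size=(4, 4))
counts = mask.astype(int)  # assumed: how often each weight has been active

new_mask = drop_and_grow(weights, mask, grads, counts, step=10, k=2)
print(int(mask.sum()), int(new_mask.sum()))  # the two counts are equal
```

Setting `c=0` reduces the grow rule to a purely greedy choice by gradient magnitude, while a large `c` favours weights that have rarely been active.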
Towards Personalized Federated Learning via Heterogeneous Model Reassembly
This paper focuses on addressing the practical yet challenging problem of
model heterogeneity in federated learning, where clients possess models with
different network structures. To tackle this problem, we propose a novel
framework called pFedHR, which leverages heterogeneous model reassembly to
achieve personalized federated learning. In particular, we approach the problem
of heterogeneous model personalization as a model-matching optimization task on
the server side. Moreover, pFedHR automatically and dynamically generates
informative and diverse personalized candidates with minimal human
intervention. Furthermore, our proposed heterogeneous model reassembly
technique mitigates the adverse impact introduced by using public data with
different distributions from the client data to a certain extent. Experimental
results demonstrate that pFedHR outperforms baselines on three datasets under
both IID and Non-IID settings. Additionally, pFedHR effectively reduces the
adverse impact of using different public data and dynamically generates diverse
personalized models in an automated manner.
Rethinking Data Distillation: Do Not Overlook Calibration
Neural networks trained on distilled data often produce over-confident output
and require correction by calibration methods. Existing calibration methods
such as temperature scaling and mixup work well for networks trained on
original large-scale data. However, we find that these methods fail to
calibrate networks trained on data distilled from large source datasets. In
this paper, we show that distilled data lead to networks that are not
calibratable due to (i) a more concentrated distribution of the maximum logits
and (ii) the loss of information that is semantically meaningful but unrelated
to classification tasks. To address this problem, we propose Masked Temperature
Scaling (MTS) and Masked Distillation Training (MDT) which mitigate the
limitations of distilled data and achieve better calibration results while
maintaining the efficiency of dataset distillation.
Comment: ICCV 202
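For context, standard temperature scaling (the existing baseline named in the abstract, not the proposed MTS) divides the logits by a scalar T fitted on held-out data. The sketch below is a minimal version under stated assumptions: the validation logits and labels are made up, and T is chosen by a simple grid search over the negative log-likelihood instead of a gradient-based optimizer.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Row-wise softmax of logits / T, computed stably."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Mean negative log-likelihood of the true labels at temperature T."""
    p = softmax(logits, T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    """Pick the temperature on the grid that minimizes validation NLL."""
    return float(min(grid, key=lambda T: nll(logits, labels, T)))

# Made-up validation set: confident logits, one of which is wrong.
val_logits = np.array([[4.0, 1.0, 0.0],
                       [0.5, 3.0, 0.2],
                       [5.0, 0.0, 0.5]])
val_labels = np.array([0, 1, 2])

T = fit_temperature(val_logits, val_labels)
print(T, softmax(val_logits, T).max(axis=1))
```

A temperature above 1 flattens the predicted distribution and lowers the maximum probability without changing the predicted class.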
Gentopia: A Collaborative Platform for Tool-Augmented LLMs
Augmented Language Models (ALMs) empower large language models with the
ability to use tools, transforming them into intelligent agents for real-world
interactions. However, most existing frameworks for ALMs, to varying degrees,
are deficient in the following critical features: flexible customization,
collaborative democratization, and holistic evaluation. We present gentopia, an
ALM framework enabling flexible customization of agents through simple
configurations, seamlessly integrating various language models, task formats,
prompting modules, and plugins into a unified paradigm. Furthermore, we
establish gentpool, a public platform enabling the registration and sharing of
user-customized agents. Agents registered in gentpool are composable such that
they can be assembled together for agent collaboration, advancing the
democratization of artificial intelligence. To ensure high-quality agents,
gentbench, an integral component of gentpool, is designed to thoroughly
evaluate user-customized agents across diverse aspects such as safety,
robustness, efficiency, etc. We release gentopia on GitHub and will
continuously move forward.
Neurogenesis Dynamics-inspired Spiking Neural Network Training Acceleration
Biologically inspired Spiking Neural Networks (SNNs) have attracted
significant attention for their ability to provide extremely energy-efficient
machine intelligence through event-driven operation and sparse activities. As
artificial intelligence (AI) becomes ever more democratized, there is an
increasing need to execute SNN models on edge devices. Existing works adopt
weight pruning to reduce SNN model size and accelerate inference. However,
these methods mainly focus on how to obtain a sparse model for efficient
inference, rather than training efficiency. To overcome these drawbacks, in
this paper, we propose a Neurogenesis Dynamics-inspired Spiking Neural Network
training acceleration framework, NDSNN. Our framework is computationally
efficient and trains a model from scratch with dynamic sparsity without
sacrificing model fidelity. Specifically, we design a new drop-and-grow
strategy with a decreasing number of non-zero weights, to maintain extremely high
sparsity and high accuracy. We evaluate NDSNN using VGG-16 and ResNet-19 on
CIFAR-10, CIFAR-100 and Tiny-ImageNet. Experimental results show that NDSNN
achieves up to 20.52% improvement in accuracy on Tiny-ImageNet using ResNet-19
(with a sparsity of 99%) as compared to other SOTA methods (e.g., Lottery
Ticket Hypothesis (LTH), SET-SNN, RigL-SNN). In addition, the training cost of
NDSNN is only 40.89% of the LTH training cost on ResNet-19 and 31.35% of the
LTH training cost on VGG-16 on CIFAR-10.